NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Fluid Language Model Benchmarking

Hofmann, Valentin; Heineman, David; Magnusson, Ian; Lo, Kyle; Dodge, Jesse; Sap, Maarten; Koh, Pang Wei; Wang, Chun; Hajishirzi, Hannaneh; Smith, Noah A. (October 2025, Conference on Language Modeling)

Full Text Available
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Mireshghallah, Niloofar; Kim, Hyunwoo; Zhou, Xuhui; Tsvetkov, Yulia; Sap, Maarten; Shokri, Reza; Choi, Yejin (May 2024, International Conference on Learning Representations)

Existing efforts to quantify the privacy implications of large language models (LLMs) focus solely on measuring leakage of training data. In this work, we shed light on the often-overlooked interactive settings where an LLM receives information from multiple sources and generates an output to be shared with other entities, creating the potential to expose sensitive input data in inappropriate contexts. In these scenarios, humans naturally uphold privacy by choosing whether or not to disclose information depending on the context. We ask the question “Can LLMs demonstrate an equivalent discernment and reasoning capability when considering privacy in context?” We propose CONFAIDE, a benchmark grounded in the theory of contextual integrity and designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. CONFAIDE consists of four tiers, gradually increasing in complexity, with the final tier evaluating contextual privacy reasoning and theory of mind capabilities. Our experiments show that even commercial models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively, highlighting the urgent need for new directions in privacy-preserving approaches, as we demonstrate a larger underlying problem stemming from the models’ lack of reasoning capabilities.
Full Text Available
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents

https://doi.org/10.18653/v1/2024.acl-long.698

Wang, Ruiyi; Yu, Haofei; Zhang, Wenxin; Qi, Zhengyang; Sap, Maarten; Bisk, Yonatan; Neubig, Graham; Zhu, Hao (January 2024, Association for Computational Linguistics)

Full Text Available
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

Zhou, Xuhui; Zhu, Hao; Mathur, Leena; Zhang, Ruohong; Yu, Haofei; Qi, Zhengyang; Morency, Louis-Philippe; Bisk, Yonatan; Fried, Daniel; Neubig, Graham; et al. (January 2024, The Twelfth International Conference on Learning Representations)

Full Text Available
NLPositionality: Characterizing Design Biases of Datasets and Models

https://doi.org/10.18653/v1/2023.acl-long.505

Santy, Sebastin; Liang, Jenny; Le Bras, Ronan; Reinecke, Katharina; Sap, Maarten (January 2023, Association for Computational Linguistics)
COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

https://doi.org/10.18653/v1/2023.findings-acl.392

Zhou, Xuhui; Zhu, Hao; Yerukola, Akhila; Davidson, Thomas; Hwang, Jena D.; Swayamdipta, Swabha; Sap, Maarten (January 2023, Findings of the Association for Computational Linguistics: ACL 2023)

Full Text Available
Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines

https://doi.org/10.18653/v1/2022.acl-long.222

Gabriel, Saadia; Hallinan, Skyler; Sap, Maarten; Nguyen, Pemi; Roesner, Franziska; Choi, Eunsol; Choi, Yejin (January 2022, ACL)

Full Text Available
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

https://doi.org/10.18653/v1/2021.emnlp-main.397

Baheti, Ashutosh; Sap, Maarten; Ritter, Alan; Riedl, Mark (January 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing)

Full Text Available
Challenges in Automated Debiasing for Toxic Language Detection

https://doi.org/10.18653/v1/2021.eacl-main.274

Zhou, Xuhui; Sap, Maarten; Swayamdipta, Swabha; Choi, Yejin; Smith, Noah A. (January 2021, Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume)

Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection. Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English). Our comprehensive experiments establish that existing methods are limited in their ability to prevent biased behavior in current toxicity detectors. We then propose an automatic, dialect-aware data correction method, as a proof-of-concept. Despite the use of synthetic labels, this method reduces dialectal associations with toxicity. Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.
Full Text Available
